This example tweaks a few results to prepare them for publication or presentation, using the dataset from a project comparing autobiographies: osf.io/vwq9p.
Built with R 3.4.1 on August 28 2017
First, be sure splot is loaded:
library(splot)
Then load in the data and attach it (can take a little while):
data=read.csv('https://osf.io/ey9hs/?action=download')
attach(data)
Texts are split into 100 segments each (the index variable), and are identified by author (the name variable).
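A quick way to confirm that layout is to tabulate segments per author. The sketch below uses a small made-up data frame (toy, with hypothetical authors and random scores) standing in for the downloaded data, so it runs without a network connection; the same two checks could be applied to the real name and index columns.

```r
# Toy stand-in for the real data: 3 hypothetical authors x 100 segments.
set.seed(1)
toy = expand.grid(index = 1:100, name = c('Author A', 'Author B', 'Author C'))
toy$anger = runif(nrow(toy))  # made-up category scores

# Each author should contribute exactly 100 segments...
segs = table(toy$name)

# ...and index should run from 1 to 100 within every author.
rng = tapply(toy$index, toy$name, range)
```

If any author showed a count other than 100, plots against index would not line up across texts.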
To look at a particular category across segments of text, we can just plot the category against index, with name in the by position:
splot(anger~index*name)
Quite a mess.
We could arrange the legend and get rid of points to clean things up a bit:
#lpos positions the legend, ncol changes its number of columns, and cex['leg'] changes its font size.
splot(anger~index*name, lpos='top', ncol=4, cex=c(leg=.8), points=FALSE)
With this many levels of by, the legend isn’t very useful in its current form, so we might just remove it.
We’re also interested in non-linear trends over the course of each text, so we’ll switch to loess lines, and we’ll use myl to zoom in (restrict the range) a bit:
splot(anger~index*name, leg=FALSE, points=FALSE, lines='loess', myl=c(-.05,1.7))
Much better, but still not very informative.
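To see what a loess line represents, here is a minimal base R sketch on made-up scores for one hypothetical text. It uses stats::loess with its default settings; whether splot uses the same defaults internally is an assumption. The ylim argument plays the role that myl plays in splot.

```r
# Made-up segment scores for one hypothetical text.
set.seed(2)
index = 1:100
score = sin(index / 15) + rnorm(100, sd = 0.3)

# Fit a local regression and get the smoothed value at each segment.
fit = loess(score ~ index)
smoothed = predict(fit)

# Draw the smoothed line only, restricting the y range.
plot(index, smoothed, type = 'l', ylim = c(-0.5, 1.5))
```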
This project was mostly interested in the Rodger text: how it relates to the Hitler text, and how both of those relate to other autobiographies. As such, it would be nice to know which lines are the Rodger and Hitler texts, and which are (in this case) the highest-scoring comparison texts.
To do this, we’ll first have to set up line types and colors outside of splot:
ns=levels(factor(data$name))
nl=length(ns)
disp=data.frame(name=ns, lty=rep(1:9, nl)[1:nl])
disp$col=ifelse(grepl('Hit|Rod', disp$name), '#880101', '#aaaaaa')
disp$lty[grepl('Hit|Rod', disp$name)]=1:2
#now we have an object with names associated with line types (lty) and colors (col):
disp[1:10,]
##                 name lty     col
## 1       Adolf Hitler   1 #880101
## 2       Andrew White   2 #aaaaaa
## 3       Annie Besant   3 #aaaaaa
## 4   Anthony Trollope   4 #aaaaaa
## 5  Benjamin Franklin   5 #aaaaaa
## 6         Black Hawk   6 #aaaaaa
## 7   Catherine Spence   7 #aaaaaa
## 8     Charles Darwin   8 #aaaaaa
## 9    Charles Thomson   9 #aaaaaa
## 10   Clarence Darrow   1 #aaaaaa
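The line type for the tenth name wraps back to 1 because rep(1:9, nl)[1:nl] cycles through nine values and then trims to one per name. A minimal sketch of that recycling and of the pattern-based coloring, using hypothetical names (ns_toy and the other _toy objects are made up for illustration):

```r
# Twelve hypothetical text names, two of which match the pattern.
ns_toy = c(paste('Text', LETTERS[1:10]), 'Hit example', 'Rod example')
nl_toy = length(ns_toy)

# Cycle through line types 1 to 9, then trim to one value per name.
lty_toy = rep(1:9, nl_toy)[1:nl_toy]

# rep_len() is a base R shortcut for the same recycling.
same = identical(lty_toy, rep_len(1:9, nl_toy))

# Flag the focal texts by pattern and give them a distinct color.
focal = grepl('Hit|Rod', ns_toy)
col_toy = ifelse(focal, '#880101', '#aaaaaa')
```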
Now within splot (using the add argument), we can always mark the Rodger and Hitler texts, and dynamically display a set of comparison texts.
Here, the mnc part is calculating the mean of each text across segments, then sorting names by their mean and pulling the highest 2. The lowest 2 could be pulled by changing the TRUE to FALSE.
In the first legend call, the constants are position (set in the first argument), and line weight (lwd) and box type (bty; 'n' for none). Dynamic arguments are pulled from the disp object we created before, based on the names in mnc.
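The mnc calculation can be tried outside of splot on a toy object. The sketch below assumes that the plotting data is a named list with one element per text, each holding a y vector; that shape is inferred from the calls described here, and the names and values in toy are made up.

```r
# Toy stand-in: a named list with one element per text, each with a y vector.
toy = list(
  'Adolf Hitler' = list(y = c(1.0, 1.2, 1.4)),
  'Elliot Rodger' = list(y = c(1.5, 1.7, 1.9)),
  'Text A' = list(y = c(0.2, 0.3, 0.4)),
  'Text B' = list(y = c(0.8, 0.9, 1.0)),
  'Text C' = list(y = c(0.5, 0.6, 0.7))
)

# Pull each y into a matrix (one column per text), then average each column.
ymat = sapply(toy, '[[', 'y')
means = apply(ymat, 2, mean)

# Drop the focal texts, sort the rest, and keep the names of the top 2.
comp = means[-grep('Hit|Rod', names(toy))]
mnc_toy = names(sort(comp, TRUE)[1:2])     # highest: 'Text B', 'Text C'
mnc_low = names(sort(comp, FALSE)[1:2])    # lowest: 'Text A', 'Text C'
```

The second argument to sort is decreasing, which is why swapping TRUE for FALSE switches from the highest to the lowest means.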
splot(anger~index*name, leg=FALSE, points=FALSE, lines='loess', myl=c(0,2.2), colors=disp$col, lty=disp$lty, add={
mnc=names(sort(apply(sapply(cdat$`^^`, '[[', 'y'), 2, mean)[-grep('Hit|Rod', names(cdat$`^^`))], TRUE)[1:2])
legend('topleft', sort(mnc), lty=disp[disp$name%in%mnc, 2], col=disp[disp$name%in%mnc, 3], lwd=2, bty='n')
legend('topright', c('Adolf Hitler', 'Elliot Rodger'), lty=1:2, col=disp$col[1], lwd=2, bty='n')
})
If we were planning on applying these settings to several graphs (if, say, we wanted to look at several different y variables individually), we could add the consistent stuff to an options list:
#quote() prevents things from being evaluated outside of the function.
opt = list(
leg = FALSE,
points = FALSE,
lines = 'loess',
colors = disp$col,
lty = disp$lty,
add = quote({
mnc=names(sort(apply(sapply(cdat$`^^`, '[[', 'y'), 2, mean)[-grep('Hit|Rod', names(cdat$`^^`))], TRUE)[1:2])
legend('topleft', sort(mnc), lty=disp[disp$name%in%mnc, 2], col=disp[disp$name%in%mnc, 3], lwd=2, bty='n')
legend('topright', c('Adolf Hitler', 'Elliot Rodger'), lty=1:2, col=disp$col[1], lwd=2, bty='n')
})
)
This can save some room at least:
splot(death~index*name, myl=c(0,1.35), options=opt)
We might want to look at a few variables at the same time. In this dataset, the difference in pronoun use between the Rodger and Hitler text is interesting, so we’ll look at personal pronouns (i, we, you, shehe, they) and impersonal pronouns (ipron) all together:
splot(cbind(i, we, you, shehe, they, ipron)~name, type='bar', su=grepl('Hit|Rod', name))
To better compare each variable between texts, we can put the variables on the x axis with mv.as.x=TRUE, and z-score with mv.scale=TRUE to standardize ranges.
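As a reminder of what z-scoring does, here is a minimal base R sketch on made-up values for two variables with very different ranges. It illustrates the transformation in general; exactly how splot applies it internally is not shown here and is an assumption.

```r
# Made-up scores for two hypothetical variables on different scales.
toy = data.frame(i = c(2, 4, 6, 8), ipron = c(40, 50, 60, 70))

# z-score by hand: subtract the mean, divide by the standard deviation.
z_manual = (toy$i - mean(toy$i)) / sd(toy$i)

# scale() does the same for every column at once.
z = scale(toy)
```

After scaling, both columns have mean 0 and standard deviation 1, so bars for different variables can share one y axis.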
While we’re at it, we’ll also pretty things up with better axis titles, and by removing the subset message (which can be done with sud=FALSE or just by subsetting the data), and changing colors and legend position:
splot(cbind(i, we, you, shehe, they, ipron)~name, type='bar', mv.as.x=TRUE, mv.scale=TRUE,
data=data[grep('Hit|Rod', data$name),], title='Pronoun Usage between Texts', laby='Pronouns (z-scored)',
labx='Category', colors='grey', lpos='topright')
Now that we have some formatted figures, we might want to use them in papers or slides.
When it comes to rendering, two factors make for the nicest looking figures: anti-aliasing (making lines less pixelated) and scalability (they won’t become pixelated when zoomed in on; a feature of vector-based formats like pdf, svg, and emf). Using vector-based formats means you can preserve the scale of a plot without losing quality; that is, if you want larger text and more compact elements, you can adjust your plot window and rerender the plot until the proportions look right. Even if the window is very small, you can blow the vector-based image up to any size.
For publication, pdfs can be included in LaTeX documents, emfs can be included in Word documents, and svgs can be included in markdown documents like this one.
For presentation, emfs can be included in PowerPoint slides, and svgs can be included in markdown slides.
The default format when saving from within splot (save=TRUE) is pdf (using cairo_pdf). If you’re using RStudio, you can save the same quality of pdf from the Export dropdown.
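For comparison, a vector file can also be written directly with a base R graphics device. This sketch uses pdf() because it is available in every R installation; cairo_pdf() takes the same filename, width, and height arguments where cairo support is present. The plot itself is a made-up placeholder.

```r
# Write a simple base R plot to a temporary pdf (a vector format).
out = tempfile(fileext = '.pdf')

# Width and height are in inches; they set proportions, not quality.
pdf(out, width = 6, height = 4)
plot(1:10, (1:10)^2, type = 'l', xlab = 'x', ylab = 'x squared')
dev.off()
```

Smaller dimensions give relatively larger text and more compact elements, just as shrinking the plot window does.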
To save as anti-aliased emfs, you can install and load the devEMF package (install.packages('devEMF'); library('devEMF')), then just add format=emf to the plot you want to save.
Brought to you by the Language Use and Social Interaction lab at Texas Tech University